Data-Guided Controllability: Learning from the Human Genome

Authors

  • G. Patterson
  • A. Surana
  • M. Mesbahi
  • S. Smale
  • A. Bloch
  • I. Rajapakse
Abstract

We develop a framework for the control of dynamical systems that we know very little about. From limited, high-dimensional data observations, we approximate the natural dynamics of the system and investigate the controllability of the identified equations. The linear, discrete-time, time-invariant model is identified using dynamic mode decomposition (DMD), which is motivated by Koopman operator theory and can capture, from data, underlying nonlinear coherent structures. Viewing the identified linear model as a network, we propose three “control evaluations” to recommend intelligent control placements for steering the system to a specific target. The motivating application for this work is the control of genetic networks, in the context of cell reprogramming. We show that our targeted control evaluations identify known cell reprogramming strategies better than existing, general control metrics. Our framework is a first step toward using mathematics, guided by data, to control the genome.

When the governing equations of a dynamical system are known or can be well approximated, there are effective methods to study the controllability of that system. However, complex dynamical systems that are difficult or resource-intensive to observe yield limited data, which makes it challenging to generate adequate governing equations. Biological systems in particular have limited data, so it has been difficult to gain insight into their controllability. The motivating application and focus for this work is the control of genetic networks, specifically in the context of cell reprogramming.

While it is generally accepted that the human genome is a dynamical system [1, 2], the dynamic properties of the genome are poorly understood, despite increasingly advanced technologies and assays (e.g. RNA-seq, ChIP-seq, Hi-C). Nevertheless, Weintraub et al. demonstrated control over that system with the reprogramming of fibroblasts (FIB) into muscle cells via the introduction of a single transcription factor, MyoD [3]. In addition, Yamanaka et al. (2007) successfully reprogrammed a human FIB into an induced pluripotent stem cell (iPSC). With information regarding merely the initial state (FIB) and the desired final state (embryonic stem cell, ESC), Yamanaka and colleagues predicted and determined experimentally that the system can be driven from one cell type to another using just the four transcription factors OCT4, SOX2, KLF4, and MYC (collectively abbreviated OSKM).

In the same spirit, our aim is to formalize and validate mathematically the findings of Yamanaka. Using function (RNA-seq) and form (Hi-C) data, as well as binding location data for 222 transcription factors (TFs), our goal is to recommend mathematically which TFs are the best candidates to reprogram FIB into another cell type. More generally, we present a methodology for making inferences about the controllability of a dynamical system with respect to a specific target when the precise dynamic equations are unknown and only minimal data is available. We present the proposed framework in a general context, followed by the application to controlling genetic networks. We show that ...
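To make the two ingredients named in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It assumes the standard SVD-based least-squares form of DMD for fitting x_{k+1} ≈ A x_k, and it substitutes the textbook Kalman rank test for the paper's three "control evaluations", which this excerpt does not define. The function names, the toy matrix `A_true`, and the input matrices are all hypothetical.

```python
import numpy as np


def simulate(A, x0, steps):
    """Generate snapshots x_0, ..., x_steps of x_{k+1} = A x_k as columns."""
    X = [np.asarray(x0, dtype=float)]
    for _ in range(steps):
        X.append(A @ X[-1])
    return np.column_stack(X)


def dmd_fit(X, rank=None):
    """Least-squares estimate of A from a snapshot matrix X (states x time).

    Assumption: plain SVD-based DMD, A = X1 V S^{-1} U^T, optionally
    truncated to `rank` singular values. Without truncation, very small
    singular values will amplify noise.
    """
    X0, X1 = X[:, :-1], X[:, 1:]
    U, s, Vh = np.linalg.svd(X0, full_matrices=False)
    if rank is not None:
        U, s, Vh = U[:, :rank], s[:rank], Vh[:rank]
    return X1 @ Vh.conj().T @ np.diag(1.0 / s) @ U.conj().T


def controllability_rank(A, B):
    """Rank of the Kalman matrix [B, AB, ..., A^{n-1}B].

    Assumption: this generic test stands in for the paper's targeted
    control evaluations, which are not specified in this excerpt.
    """
    n = A.shape[0]
    blocks = [B]
    for _ in range(n - 1):
        blocks.append(A @ blocks[-1])
    return np.linalg.matrix_rank(np.hstack(blocks))


# Hypothetical 3-node chain: node 3 drives node 2, which drives node 1.
A_true = np.array([[0.9, 0.1, 0.0],
                   [0.0, 0.8, 0.1],
                   [0.0, 0.0, 0.7]])
X = simulate(A_true, x0=[1.0, 1.0, 1.0], steps=9)
A_hat = dmd_fit(X)

B_upstream = np.array([[0.0], [0.0], [1.0]])    # actuate node 3
B_downstream = np.array([[1.0], [0.0], [0.0]])  # actuate node 1
print(controllability_rank(A_true, B_upstream))
print(controllability_rank(A_true, B_downstream))
```

In this toy network, placing the input at the upstream node reaches every state, while placing it at the downstream node does not, which is the kind of placement question the control evaluations are meant to rank. Numerical rank is sensitive to tolerance, so on a matrix estimated from noisy data a Gramian-based or energy-based measure is usually more informative than a raw rank.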





Publication date: 2015